Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: copy_behaviors to make sub-classing easy #3137

Merged
merged 4 commits into from
Jun 26, 2024
Merged

Conversation

Saransh-cpp
Copy link
Member

@Saransh-cpp Saransh-cpp commented May 30, 2024

XRef #2433

@lgray
Copy link
Contributor

lgray commented Jun 18, 2024

Any update on the tests for this PR?

@Saransh-cpp
Copy link
Member Author

Hi, this already has tests! I think I forgot to request a review.

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be sure to update the rapidjson submodule in this PR so that the PR doesn't try to change it:

image

src/awkward/_util.py Outdated Show resolved Hide resolved
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I edited this PR to remove the continue. (There was a continue in the previous function, unique_list, because it was following a memoization pattern that should fail early with minimal fanfare if an item is in the memo. The copy_behaviors function is different: not isinstance(key, str) and "*" not in key is part of the basic logic.)

Along the way, I noticed something more dangerous: copy_behaviors modifies the behavior in place, and if there's an error part-way through, it will leave the behavior in an intermediate state, which is likely invalid. At least it should not apply its changes until there's no chance of failure. But, better still, it can return new behaviors for the caller to update, rather than change them in place. This is a different API:

ak.behavior.update(ak._util.copy_behaviors(existing_class, new_class, ak.behavior)

The fact that the ak.behavior has to be mentioned twice is a feature: maybe the user wants the new behaviors in some different dict.

I'm still confused about why these two cases are excluded:

  • key is a single string, such as existing_class.__name__
  • key includes a "*", which indicates that a class may be inside of nested lists.

As it is, copy_behaviors would copy everything except the ak.Record subclass (e.g. SomeObject with methods) and its vectorized verions (e.g. ArrayOfSomeObject with vectorized methods). These will later be replaced by overloads, but so will some of the NumPy ufunc or Numba typing/lowering overloads—I don't know why these are singled out. But what copy_behaviors should do is determined by its use, and I haven't seen how it's intended to be used.

I see that you fixed the rapidjson directory, so this is ready to merge, if my confusion above is unfounded. Otherwise, you can remove the check for "*" not in key and add a special case for key == existing_class.__name__.

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm merging this first version of copy_behaviors now, but as I said above, I think it should do a different thing (not skip the keys it's skipping). That can come later.

@jpivarski jpivarski merged commit b458523 into main Jun 26, 2024
39 checks passed
@jpivarski jpivarski deleted the easy-subclassing branch June 26, 2024 16:58
@Saransh-cpp
Copy link
Member Author

Hi @jpivarski, thanks for the review and fixing the PR! Sorry for taking too long to respond here. The MWE below shows why I added the if condition in this PR -

import awkward as ak
import numpy

class SuperVector:
    def add(self, other):
        return ak.zip(
            {"x": self.x + other.x, "y": self.y + other.y},
            with_name="VectorTwoD",
            behavior=self.behavior,
        )

@ak.mixin_class(ak.behavior)
class VectorTwoD(SuperVector):
    def __eq__(self, other):
        return ak.all(self.x == other.x) and ak.all(self.y == other.y)

ak.behavior[numpy.add, "VectorTwoD", "VectorTwoD"] = lambda v1, v2: v1.add(v2)
print(ak.behavior)

@ak.mixin_class(ak.behavior)
class VectorTwoDAgain(VectorTwoD):
    pass

print(ak.behavior) 
# ('*', 'VectorTwoDAgain'): <class '__main__.VectorTwoDAgainArray'>
# 'VectorTwoDAgain': <class '__main__.VectorTwoDAgainRecord'>

ak.behavior.update(
    ak._util.copy_behaviors(VectorTwoD, VectorTwoDAgain, ak.behavior)
)
print(ak.behavior) 
# ('*', 'VectorTwoDAgain'): <class '__main__.VectorTwoDArray'> -> removing the star condition (the class is wrong)
# ('V', 'e', 'c', 't', 'o', 'r', 'T', 'w', 'o', 'D', 'A', 'g', 'a', 'i', 'n'): <class '__main__.VectorTwoDAgainRecord'> -> removing the string condition

Additionally, replacing the generic string condition with a more specific case -

if oldname != key:

leads to -

{'VectorTwoD': <class '__main__.VectorTwoDRecord'>, ('*', 'VectorTwoD'): <class '__main__.VectorTwoDArray'>, (<ufunc 'add'>, 'VectorTwoD', 'VectorTwoD'): <function <lambda> at 0x1050a7ba0>, 'VectorTwoDAgain': <class '__main__.VectorTwoDAgainRecord'>, ('*', 'VectorTwoDAgain'): <class '__main__.VectorTwoDAgainArray'>, (<ufunc 'add'>, 'VectorTwoDAgain', 'VectorTwoDAgain'): <function <lambda> at 0x1050a7ba0>, ('V', 'e', 'c', 't', 'o', 'r', 'T', 'w', 'o', 'D', 'A', 'g', 'a', 'i', 'n'): <class '__main__.VectorTwoDAgainRecord'>}

Could you please let me know what should be done here? I tested the current implementation with Coffea and everything works as expected. (I pointed out that there is a bug in copy_behaviors on Slack but that was a false alarm.)

@jpivarski
Copy link
Member

('V', 'e', 'c', 't', 'o', 'r', 'T', 'w', 'o', 'D', 'A', 'g', 'a', 'i', 'n')

You definitely want to check for the isinstance(key, str) case and not iterate over the string as though it were a tuple.

I was assuming that you'd copy all of the behaviors, including the "VectorTwoD" and the ("*", "VectorTwoD") ones, and then replace the ones that are different for the new class, which may be NumPy overloads, Numba overloads, or whole class overloads. That's why I was thinking you'd want this function:

def copy_behaviors(existing_class: typing.Any, new_class: typing.Any, behavior: dict):
    output = {}

    oldname = existing_class.__name__
    newname = new_class.__name__

    for key, value in behavior.items():
        if isinstance(key, str):
            if key == oldname:
                output[newname] = value
        else:
            if oldname in key:
                new_tuple = tuple(newname if k == oldname else k for k in key)
                output[new_tuple] = value

    return output

and then define the new class after having copied the behaviors:

class VectorTwoD(SuperVector):
    def __eq__(self, other):
        return ak.all(self.x == other.x) and ak.all(self.y == other.y)

ak.behavior["VectorTwoD"] = VectorTwoD
ak.behavior["*", "VectorTwoD"] = VectorTwoD
ak.behavior[numpy.add, "VectorTwoD", "VectorTwoD"] = lambda v1, v2: v1.add(v2)

ak.behavior.update(ak._util.copy_behaviors(VectorTwoD, VectorTwoDAgain, ak.behavior))

class VectorTwoDAgain(VectorTwoD):
    pass

ak.behavior["VectorTwoDAgain"] = VectorTwoDAgain
ak.behavior["*", "VectorTwoDAgain"] = VectorTwoDAgain
ak.behavior[numpy.add, "VectorTwoDAgain", "VectorTwoDAgain"] = lambda v1, v2: v1.add(v2)

But now I see the problem. You were using the @mixin_class decorator to add both Array and Record forms of the class to ak.behavior, which is equivalent to assigning them directly into the ak.behavior dict, as I did above, but since it's a decorator, it has to be assigned exactly when the class gets defined. The current API for copy_behaviors takes the class object as an argument, so it has to be called after the class gets defined. That's why you can't copy all behaviors and then overwrite some of them: there are causality constraints on when they can be defined and when copy_behaviors can be called.

This is going to be a problem because the "VectorTwoD" and the ("*", "VectorTwoD") keys are not the only ones that get assigned by decorators: keys like (numpy.add, "VectorTwoD", "VectorTwoD") can also be defined by @ak.mixin_class_method, which has to be called at the time when the class is defined. (See ak.mixin_class_method and Mixin decorators.) If the developer writes a class using this instead of assigning a lambda expression, they'd be in just as much trouble as my example above.

The copy_behaviors function needs to not take the class object as an argument, and it should (as a best practice) be called before the class gets defined. In fact, the function doesn't even use the class objects, just their names; it should only ask for what it's going to need.

This should be enough:

def copy_behaviors(from_name: str, to_name: str, behavior: dict):
    output = {}

    for key, value in behavior.items():
        if isinstance(key, str):
            if key == from_name:
                output[to_name] = value
        else:
            if from_name in key:
                new_tuple = tuple(to_name if k == from_name else k for k in key)
                output[new_tuple] = value

    return output

which would then be used like this:

class VectorTwoD(SuperVector):
    def __eq__(self, other):
        return ak.all(self.x == other.x) and ak.all(self.y == other.y)

ak.behavior["VectorTwoD"] = VectorTwoD
ak.behavior["*", "VectorTwoD"] = VectorTwoD
ak.behavior[numpy.add, "VectorTwoD", "VectorTwoD"] = lambda v1, v2: v1.add(v2)

ak.behavior.update(ak._util.copy_behaviors("VectorTwoD", "VectorTwoDAgain", ak.behavior))

class VectorTwoDAgain(VectorTwoD):
    pass

ak.behavior["VectorTwoDAgain"] = VectorTwoDAgain
ak.behavior["*", "VectorTwoDAgain"] = VectorTwoDAgain
ak.behavior[numpy.add, "VectorTwoDAgain", "VectorTwoDAgain"] = lambda v1, v2: v1.add(v2)

or like this:

@ak.mixin_class(ak.behavior)
class VectorTwoD(SuperVector):
    def __eq__(self, other):
        return ak.all(self.x == other.x) and ak.all(self.y == other.y)

ak.behavior[numpy.add, "VectorTwoD", "VectorTwoD"] = lambda v1, v2: v1.add(v2)

ak.behavior.update(ak._util.copy_behaviors("VectorTwoD", "VectorTwoDAgain", ak.behavior))

@ak.mixin_class(ak.behavior)
class VectorTwoDAgain(VectorTwoD):
    pass

ak.behavior[numpy.add, "VectorTwoDAgain", "VectorTwoDAgain"] = lambda v1, v2: v1.add(v2)

The general strategy, for a developer who is using the copy_behaviors function, is to (1) copy everything and (2) replace what's different about this new class. The copy_behaviors API needs to be flexible enough to let them do that. Since some ways of defining behaviors are bound to a class decorator, constraining when they can be assigned, the copy_behaviors has to not depend on the class object, or else there will be a circular dependency.

@Saransh-cpp
Copy link
Member Author

Thank you for the detailed explanation! I went through it and have created a PR with the fix.

@nsmith-
Copy link
Contributor

nsmith- commented Jul 9, 2024

How should I interpret the fact this is in the _util module? Is it private? Is it ok to use in coffea?

@Saransh-cpp
Copy link
Member Author

copy_behaviors is definitely intended to be a public API. Maybe I should move the function to another module?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants